{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "83ff2437",
   "metadata": {},
   "source": [
    "<p align=\"center\">\n",
    "    <img src=\"https://github.com/GeostatsGuy/GeostatsPy/blob/master/TCG_color_logo.png?raw=true\" width=\"220\" height=\"240\" />\n",
    "\n",
    "</p>\n",
    "\n",
    "## Spatial Data Analytics \n",
    "\n",
    "### Interactive Demonstration of Variogram h-Scatterplots \n",
    "\n",
    "#### Michael Pyrcz, Associate Professor, The University of Texas at Austin \n",
    "\n",
    "##### Contacts: [Twitter/@GeostatsGuy](https://twitter.com/geostatsguy) | [GitHub/GeostatsGuy](https://github.com/GeostatsGuy) | [www.michaelpyrcz.com](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446)\n",
    "\n",
    "This a simple demonstration of the variogram h-scatterplot for a 1D datasets with variable spatial continuity and visualization.\n",
    "\n",
    "* we will see the correlogram (equal to the covariance function when the sill, variance is 1.0) is the correlation coefficient of the h-scatterplot. \n",
    "\n",
    "* there is some deviation due to the lag effect, the edge effect with variogram calculation that excludes some of the data (e.g., at large lags only the samples at the edges of the area of interest are included in the pairs)\n",
    "\n",
    "* we will perform the calculations in 1D for fast run times and ease of visualization.\n",
    "\n",
    "Please cite this code as:\n",
    "\n",
    "Pyrcz, Michael J. (2021). PythonNumericalDemos: Educational Data Science Demonstrations Repository (1.0.0). Zenodo. https://doi.org/10.5281/zenodo.5564991\n",
    "\n",
    "#### Load the required libraries\n",
    "\n",
    "The following code loads the required libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "812f4f78",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\pm27995\\Anaconda3\\lib\\site-packages\\numpy\\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:\n",
      "C:\\Users\\pm27995\\Anaconda3\\lib\\site-packages\\numpy\\.libs\\libopenblas.GK7GX5KEQ4F6UYO3P26ULGBQYHGQO7J4.gfortran-win_amd64.dll\n",
      "C:\\Users\\pm27995\\Anaconda3\\lib\\site-packages\\numpy\\.libs\\libopenblas.XWYDX2IKJW2NMTWSFYNGFUWKQU3LYTCZ.gfortran-win_amd64.dll\n",
      "  warnings.warn(\"loaded more than 1 DLL from .libs:\"\n"
     ]
    }
   ],
   "source": [
    "import numpy as np                                          # arrays\n",
    "import matplotlib.pyplot as plt                             # plotting\n",
    "import pandas as pd                                         # dataframes\n",
    "from geostatspy import GSLIB                                # affine correction\n",
    "from ipywidgets import interactive                          # plotting widgets and interactivity\n",
    "from ipywidgets import widgets                            \n",
    "from ipywidgets import Layout\n",
    "from ipywidgets import Label\n",
    "from ipywidgets import VBox, HBox"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c46a4b7c",
   "metadata": {},
   "source": [
    "#### Set the working directory\n",
    "\n",
    "I always like to do this so I don't lose files and to simplify subsequent read and writes (avoid including the full address each time).  Also, in this case make sure to place the required (see below) data file in this working directory. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "90b55ab1",
   "metadata": {},
   "outputs": [],
   "source": [
    "#os.chdir(\"C:\\PGE337\")                                      # set the working directory"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63fed3ff",
   "metadata": {},
   "source": [
    "#### Load the Dataset\n",
    "\n",
    "It is a small 1D dataset available on my GitHup [GeoDataSets](https://github.com/GeostatsGuy/GeoDataSets) repository.\n",
    "\n",
    "Datasets are cited as: \n",
    "\n",
    "* Pyrcz, Michael J. (2021). GeoDataSets: Synthetic Subsurface Data Repository (1.0.0). Zenodo. https://doi.org/10.5281/zenodo.5564874"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e4439445",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Depth</th>\n",
       "      <th>Nporosity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.25</td>\n",
       "      <td>-1.37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.50</td>\n",
       "      <td>-2.08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.75</td>\n",
       "      <td>-1.67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.00</td>\n",
       "      <td>-1.16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.25</td>\n",
       "      <td>-0.24</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Depth  Nporosity\n",
       "0   0.25      -1.37\n",
       "1   0.50      -2.08\n",
       "2   0.75      -1.67\n",
       "3   1.00      -1.16\n",
       "4   1.25      -0.24"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/1D_Porosity.csv')\n",
    "npor = df['Nporosity']\n",
    "npor = GSLIB.affine(npor,0.0,1.0)                           # ensure variance is 1.0 for results to work below\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d61e9599",
   "metadata": {},
   "source": [
    "Notice that we ensured that the dataset variance is 1.0 as we assume this to calculate the correlogram below.\n",
    "\n",
    "#### Interactive Interface\n",
    "\n",
    "Here's the interactive interface. I calculate the variogram, plot the h-scatterplot and calculate and annotate the correlogram / h-scatterplot correlation coefficient.  \n",
    "\n",
    "* the user specifies lag to investigate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "ba7230a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "l = widgets.Text(value='                                                     Variogram h-Scatterplot Demonstration, Prof. Michael Pyrcz, The University of Texas at Austin',\n",
    "                 layout=Layout(width='970px', height='30px'))\n",
    "\n",
    "lag = widgets.IntSlider(min=1,max = 40,value=5,step = 1,description = 'Lag',orientation='horizontal',style = {'description_width': 'initial'},layout=Layout(width='980px',height='30px'),continuous_update=False)\n",
    "\n",
    "ui2 = widgets.VBox([l,lag],)\n",
    "\n",
    "def run_plot(lag): \n",
    "    size = 0.25\n",
    "    gamma = np.average(np.square((npor - npor.shift(lag)).dropna()))*0.5\n",
    "    gamma_all = []; num_pairs_all = []\n",
    "    for ilag in range(0,40):\n",
    "        num_pairs_all.append(len((npor - npor.shift(ilag)).dropna()))\n",
    "        gamma_all.append(np.average(np.square((npor - npor.shift(ilag)).dropna()))*0.5)    \n",
    "    \n",
    "    npor_shift = npor.shift(lag)\n",
    "    correl = np.round(np.corrcoef(npor[~np.isnan(npor_shift)],npor_shift[~np.isnan(npor_shift)]),2)[0][1]\n",
    "        \n",
    "    plt.subplot(121)\n",
    "    scatter = plt.scatter(np.arange(0,40)*size,gamma_all,s=num_pairs_all,color='red',edgecolor='black')\n",
    "    plt.scatter(lag*size,gamma,color='darkorange',edgecolor='black',s=40,zorder=10)\n",
    "    plt.plot([lag*size,lag*size],[0,gamma],color='black',ls='--',zorder=1)\n",
    "    plt.plot([lag*size,lag*size],[1.0,gamma],color='red',ls='--',zorder=1)\n",
    "    if gamma < 1.0:\n",
    "        plt.plot([lag*size,lag*size+0.1],[gamma,gamma+0.1],color='red',zorder=1)\n",
    "        plt.plot([lag*size,lag*size-0.1],[gamma,gamma+0.1],color='red',zorder=1)\n",
    "    else:\n",
    "        plt.plot([lag*size,lag*size+0.1],[gamma,gamma-0.1],color='red',zorder=1)\n",
    "        plt.plot([lag*size,lag*size-0.1],[gamma,gamma-0.1],color='red',zorder=1)\n",
    "    plt.plot([0,lag*size],[gamma,gamma],color='black',ls='--',zorder=1)\n",
    "    plt.plot([0,10],[1.0,1.0],color='black')\n",
    "    plt.xlim([0,10]); plt.ylim([0,3]); plt.title('Experimental Variogram')\n",
    "    plt.annotate(r'$\\gamma(\\bf{h}) =$ ' + str(np.round(gamma,2)),[lag*size+0.2,(gamma)/2])\n",
    "    plt.annotate(r'$\\bf{h} =$ ' + str(lag*size),[lag*size-0.3,0.02],rotation=90)\n",
    "    plt.annotate(r'$\\sigma^2 - \\gamma(\\bf{h}) =$ ' + str(np.round(1.0-gamma,2)),\n",
    "        [lag*size+0.2,(gamma+1.0)/2],color='red')\n",
    "    plt.xlabel(r'$\\bf{h}$ Lag Distance'); plt.ylabel(r'$\\gamma(\\bf{h})$ Variogram')\n",
    "    legend = plt.legend(*scatter.legend_elements(\"sizes\", num=4),loc='upper left')\n",
    "    legend.set_title('Number of Pairs')\n",
    "    \n",
    "    plt.subplot(122)\n",
    "    plt.scatter(npor,npor.shift(lag),color='darkorange',edgecolor='black',s=20)\n",
    "    plt.plot([-3,3],[-3,3],color='black')\n",
    "    plt.xlim([-3,3]); plt.ylim([-3,3]); \n",
    "    plt.title(r'h-Scatter Plot, lag = ' + str(lag) + r', $\\bf{h} =$ ' + str(lag*size))\n",
    "    plt.xlabel(r'$Z(\\bf{u})$ Tail'); plt.ylabel(r'$Z(\\bf{u}+ \\bf{h})$ Head')\n",
    "    plt.annotate(r'$\\rho_{Z(\\bf{u}),Z(\\bf{u} + \\bf{h})}$ = ' + str(correl),[1.0,-2.5],fontsize=12)\n",
    "    plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.1, wspace=0.2, hspace=0.3); plt.show()\n",
    "    \n",
    "# connect the function to make the samples and plot to the widgets    \n",
    "interactive_plot = widgets.interactive_output(run_plot, {'lag':lag})\n",
    "interactive_plot.clear_output(wait = True)               # reduce flickering by delaying plot updating    \n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7df92474",
   "metadata": {},
   "source": [
    "### Interactive Variogram h-scatterplot Demonstration \n",
    "\n",
    "#### Michael Pyrcz, Professor, The University of Texas at Austin \n",
    "\n",
    "Change the number of sample data, train/test split and the data noise and observe overfit! Change the model order to observe a specific model example.\n",
    "\n",
    "### The Inputs\n",
    "\n",
    "* **lag** - the lag number to calculate, h = lag $\\times$ data spacing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "6aa493e8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "06e096674f0a494888dabb5247f03fbb",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Text(value='                                                     Variogram h-Scatterplot Demons…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b95feb003a16465ea0872e58145cf0ff",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output(outputs=({'output_type': 'display_data', 'data': {'text/plain': '<Figure size 432x288 with 2 Axes>', 'i…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(ui2, interactive_plot)                           # display the interactive plot"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5267d3ae",
   "metadata": {},
   "source": [
    "#### Comments\n",
    "\n",
    "This was an interactive demonstration of the variogram h-scatter plot. \n",
    "\n",
    "I have many other demonstrations on simulation to build spatial models with spatial continuity and many other workflows available [here](https://github.com/GeostatsGuy/PythonNumericalDemos), along with a package for geostatistics in Python called [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy). \n",
    "  \n",
    "We hope this was helpful,\n",
    "\n",
    "*Michael*\n",
    "\n",
    "***\n",
    "\n",
    "#### More on Michael Pyrcz and the Texas Center for Geostatistics:\n",
    "\n",
    "### Michael Pyrcz, Associate Professor, University of Texas at Austin \n",
    "*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n",
    "\n",
    "With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n",
    "\n",
    "For more about Michael check out these links:\n",
    "\n",
    "#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n",
    "\n",
    "#### Want to Work Together?\n",
    "\n",
    "I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n",
    "\n",
    "* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n",
    "\n",
    "* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n",
    "\n",
    "* I can be reached at mpyrcz@austin.utexas.edu.\n",
    "\n",
    "I'm always happy to discuss,\n",
    "\n",
    "*Michael*\n",
    "\n",
    "Michael Pyrcz, Ph.D., P.Eng. Associate Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin\n",
    "\n",
    "#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44aabde3",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
